Symfoil-4P is a de novo protein exhibiting the threefold symmetrical β-trefoil
fold, designed on the basis of human acidic fibroblast growth factor. First,
three asparagine-glycine sequences of Symfoil-4P were replaced with
glutamine-glycine (Symfoil-QG) or serine-glycine (Symfoil-SG) sequences to
protect the protein from deamidation. His-Symfoil-II was then prepared by
introducing a protease digestion site into Symfoil-QG so that Symfoil-II has
three complete repeats after removal of the N-terminal histidine tag. The
Symfoil-QG, Symfoil-SG and His-Symfoil-II proteins were expressed in
Escherichia coli as soluble proteins and purified by nickel affinity
chromatography. Symfoil-II was further purified by anion-exchange
chromatography after removal of the histidine tag by proteolysis. Both
Symfoil-QG and Symfoil-II were crystallized in 0.1 M Tris-HCl buffer (pH 7.0)
containing 1.8 M ammonium sulfate as precipitant at 293 K; several crystal
forms were observed for Symfoil-QG and Symfoil-II. The maximum diffraction of
the Symfoil-QG and Symfoil-II crystals was 1.5 and 1.1 Å resolution,
respectively. Symfoil-II, without the histidine tag, diffracted better than
Symfoil-QG, which carries the N-terminal histidine tag. Although the crystal
packing of Symfoil-II is slightly different from that of Symfoil-QG and other
crystals of Symfoil derivatives having the N-terminal histidine tag, the
refined crystal structure of Symfoil-II showed pseudo-threefold symmetry, as
expected from other Symfoils. Since removal of the unstructured N-terminal
histidine tag did not affect the threefold structure of Symfoil, the improved
diffraction quality of Symfoil-II may be caused by molecular characteristics
of Symfoil-II such as molecular stability.
1. Introduction

Symmetry is an important theme in protein structure, function, evolution
and design. Although complete structural symmetry is observed in many
natural proteins as homo-oligomeric architectures, structural
pseudosymmetry is also observed in some monomeric proteins. These
pseudosymmetric architectures are generally hypothesized to result from
gene duplication and fusion (Sepulveda et al., 1975; Tang et al., 1978;
McLachlan, 1979; Inana et al., 1983). Two distinctly different
evolutionary models for the emergence of symmetric protein architecture
from a primordial peptide motif have been proposed (Mukhopadhyay, 2000;
Ponting & Russell, 2000; Liu et al., 2002; Yadid & Tawfik, 2007;
Akanuma et al., 2010; Richter et al., 2010).

In a previous report, we described an experimental top-down symmetric
deconstruction (TDSD) of a symmetric protein architecture (the β-trefoil
fold) using human fibroblast growth factor-1 (FGF-1), a 140 amino acid
single-domain globular protein exhibiting the characteristic threefold
symmetry of the β-trefoil architecture. The TDSD involved sequential
introduction of symmetric mutations (targeting core, reverse-turn and
β-strand secondary structure, respectively) until a purely threefold
symmetric primary structure solution was achieved. Through this approach,
we obtained a simplified β-trefoil protein (Symfoil-4P) having a reduced
amino acid alphabet size of 16 letters, and enriched in prebiotic amino
acids (to 71%) (Lee & Blaber, 2011; Longo et al., 2013).

In order to obtain a Symfoil with more complete symmetry and greater
chemical stability, we designed a monomeric protein (Symfoil-II) based on
the Symfoil-4P protein (Lee & Blaber, 2011). In the sequence of
Symfoil-II, the three asparagine-glycine sequences were removed to improve
chemical stability by preventing the formation of charge isomers through
deamidation. Furthermore, a protease digestion site was introduced to make
the three repeats of Symfoil more complete after removal of the N-terminal
histidine tag. Here, we report the crystal structure and characteristics of
Symfoil-II. Symfoil-II, with a complete threefold axis, may be useful as a
scaffold that can capture small C3 symmetric compounds using the threefold
axis within the protein.

Table 1

Primers used for site-directed mutagenesis.

Mutation    Name of primer  Sequence
N58S        SG_Site1_F      AAGGCAGTGGTGAAGTTCTG
            SG_Site1_R      CTTCACCACTGCCTTCCGGG
N58Q        QG_Site1_F      CGGAAGGCCAGGGTGAAGTTCTG
            QG_Site1_R      CACCCTGGCCTTCCGGGGAG
N100S       SG_Site2_F      AGGGTAGCGGCGAGGTACTC
            SG_Site2_R      CCTCGCCGCTACCCTCAGGG
N100Q       QG_Site2_F      CTGAGGGTCAGGGCGAGGTACTC
            QG_Site2_R      CGCCCTGACCCTCAGGGGAA
Deletion    Cdel_F          GTCGACAAGCTTGCGGCCGCACTCGAGCACCACCACCACCACCACTGA
            Cdel_R          CGCAAGCTTGTCGACTTAGCCCTGTCACTCTGGGCTAATCTGGAAT
Symfoil-II  N_delQG_F       CCGCGCGGTCAAGGTGAAGTGCTTCTTAAGAGCACTGAAACCGGCCAG
            N_delQG_R       ACCTTGACCGCGCGGCACCAGATGGTGATGGTGATGGTGCATATGTATATC



2. Materials and methods 

2.1. Site-directed mutagenesis 

To construct expression plasmids for the mutants, site-directed
mutagenesis of Symfoil-4P in the pET-21a vector (Brych et al., 2001) was
performed by polymerase chain reaction (PCR) using PrimeSTAR Max DNA
polymerase (Takara Bio). The PCR products were used to transform
Escherichia coli strain HST08 (Takara Bio) without ligation. Primers used
for PCR are listed in Table 1. For the creation of Symfoil-SG and
Symfoil-QG, subcloning of the PCR product was repeated three times. In the
first reaction, Asn100 was replaced with serine or glutamine. In the
second reaction, Asn58 was replaced with serine or glutamine. Finally,
primers Cdel_F and Cdel_R were used to delete the three C-terminal amino
acids. The resulting amino acid sequences of Symfoil-QG and Symfoil-SG are
shown in Fig. 1. For preparation of the expression plasmid of
His-Symfoil-II, the plasmid template of Symfoil-QG was amplified using
primers N_delQG_F and N_delQG_R as listed in Table 1. The DNA sequences of
the coding regions in all plasmids constructed here were confirmed using
an ABI Prism 310 DNA sequencer (Applied Biosystems).
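The primer pairs in Table 1 can be inspected with a short script. The following is a minimal sketch, assuming that each forward/reverse pair is designed with complementary 5' regions, as is typical for ligation-free PCR mutagenesis; the function names are illustrative and are not taken from the original work. It reports the stretch shared by the N58S primer pair.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def primer_overlap(forward: str, reverse: str) -> str:
    """Longest prefix of the forward primer that ends the reverse primer's complement."""
    top_strand = reverse_complement(reverse)
    for length in range(min(len(forward), len(top_strand)), 0, -1):
        if top_strand.endswith(forward[:length]):
            return forward[:length]
    return ""


# Primer pair for the N58S mutation, copied from Table 1.
sg_site1_f = "AAGGCAGTGGTGAAGTTCTG"
sg_site1_r = "CTTCACCACTGCCTTCCGGG"

overlap = primer_overlap(sg_site1_f, sg_site1_r)
print(overlap, len(overlap))
```

For this pair the shared region is 15 nucleotides long and contains the AGT codon that encodes the introduced serine.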

2.2. Expression and purification 

Synthetic polynucleotides encoding Symfoil-SG, Symfoil-QG and
His-Symfoil-II were expressed using the pET21a(+) plasmid/BL21(DE3)
Escherichia coli host expression system (Merck). Expression of the mutant
Symfoil proteins followed previously described procedures (Lee & Blaber,
2011). The cells were resuspended in buffer A [50 mM potassium phosphate
buffer (pH 7.5) containing 0.1 M NaCl] and sonicated.



Figure 1

Sequence alignment of Symfoil proteins (Symfoil-4P, Symfoil-SG,
Symfoil-QG, His-Symfoil-II and Symfoil-II). Mutated sites are boxed.
The arrow indicates the thrombin cleavage site newly introduced in
Symfoil-II.

The resultant crude protein solutions were centrifuged at 12 000 × g for
20 min. The supernatants were dialyzed against buffer A and applied to a
HisTrap FF column (5 ml) (GE Healthcare) equilibrated with buffer A. The
column was washed with buffer A containing 20 mM imidazole and eluted with
a step gradient of 250 mM imidazole. The eluted fractions were dialyzed
against buffer A and passed through a HiTrap Heparin HP column (5 ml)
(GE Healthcare) equilibrated with buffer A to remove impurities. The
flow-through fractions were collected and loaded onto a ResourceQ column
(3 ml) (GE Healthcare) equilibrated with buffer A. Elution from the
ResourceQ column was achieved with a linear gradient of NaCl.

The histidine tag of His-Symfoil-II was then removed by trypsin (Wako Pure
Chemical Industries, Japan), and the product was purified by ResourceQ
column chromatography to generate Symfoil-II. Extinction coefficients
E280nm (0.1%, 1 cm) of 3.8, 3.8, 2.9 and 3.1 were used to calculate
protein concentrations for Symfoil-SG, Symfoil-QG, His-Symfoil-II and
Symfoil-II, respectively. The final yield was about 15 mg from 1 l of
culture.
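As an illustration of how these extinction coefficients convert an absorbance reading into a concentration, the sketch below applies the Beer-Lambert relation. The E280nm (0.1%, 1 cm) values are those quoted above; the absorbance reading, dilution factor and function name are hypothetical and serve only as an example.

```python
# E280nm (0.1%, 1 cm): absorbance of a 1 mg/ml solution in a 1 cm cell.
# Values are those quoted in the text.
EXTINCTION_0_1_PERCENT = {
    "Symfoil-SG": 3.8,
    "Symfoil-QG": 3.8,
    "His-Symfoil-II": 2.9,
    "Symfoil-II": 3.1,
}


def concentration_mg_per_ml(protein: str, a280: float,
                            path_cm: float = 1.0, dilution: float = 1.0) -> float:
    """Protein concentration (mg/ml) of the undiluted sample from its A280."""
    extinction = EXTINCTION_0_1_PERCENT[protein]
    return a280 * dilution / (extinction * path_cm)


# Hypothetical reading: A280 = 0.62 measured after a tenfold dilution.
example = concentration_mg_per_ml("Symfoil-II", a280=0.62, dilution=10.0)
print(f"{example:.2f} mg/ml")
```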

2.3. Gel filtration 

To characterize the self-assembly of Symfoil-II, gel filtration 
chromatography using a Superdex 200 10/300GL column (GE 
Healthcare) was conducted. The column was equilibrated with 
50 mM potassium phosphate buffer (pH 7.5) containing 0.2 M 
NaCl. The molecular mass of eluted Symfoil-II was deter- 
mined by multi-angle laser light scattering (SEC-MALLS). 
Light-scattering analysis was performed using a miniDAWN 
detector (Wyatt Technologies). 

2.4. Crystallization 

The purified Symfoil proteins were dialyzed against 50 mM sodium
phosphate buffer (pH 7.5) containing 100 mM NaCl, 10 mM ammonium sulfate
and 0.5 mM EDTA, and then concentrated to 20 mg ml⁻¹. Crystallization was
performed by hanging-drop vapor diffusion in 0.1 M Tris-HCl buffer
(pH 7.0) containing 1.5-2.0 M ammonium sulfate as precipitant. Drops
consisting of 2 µl protein solution and 2 µl mother liquor were
equilibrated against 1 ml of reservoir solution at 293 K for one week.

954 Motoyasu Adachi et al. ■ An artificial protein with three complete sequence repeats

J. Synchrotron Rad. (2013). 20, 953-957

diffraction structural biology

2.5. Data collection and refinement 

Diffraction data from crystals of Symfoil-QG and Symfoil-II were
collected using synchrotron radiation sources (λ = 1.00 Å) at beamlines
at SPring-8 and KEK, Japan. The crystals were mounted using a nylon
cryo-loop (Hampton Research) and were frozen in a liquid-nitrogen stream
at 100 K. Diffraction data were indexed, integrated and scaled using the
HKL2000 software package (Otwinowski & Minor, 1997). A molecular
replacement search for non-isomorphous space groups was carried out using
the program Phaser from the CCP4 suite (McCoy et al., 2007; Winn et al.,
2011) with the coordinates of the de novo designed protein Symfoil-4P
[Protein Data Bank (PDB) code 3o4d] as a search model. Model building and
visualization were performed using the XtalView molecular graphics
software (McRee, 1992). The PHENIX software package (Zwart et al., 2008)
and the program REFMAC (Murshudov et al., 2011) were used for refinement,
in which 5% of the data in the reflection files were set aside for Rfree
calculations. The ARP/wARP automated procedure was used to add solvent
molecules (Lamzin & Wilson, 1993). Atomic models were drawn using the
graphics program PyMOL (DeLano, 2002).
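To relate the resolution limits reported in this work to the geometry of the data collection, Bragg's law (λ = 2d sin θ) gives the scattering angle that must be recorded for a given d-spacing at the wavelength used here. The following is a minimal sketch; the function name is illustrative, and no detector geometry from the actual experiments is assumed.

```python
import math


def two_theta_deg(d_spacing: float, wavelength: float = 1.00) -> float:
    """Scattering angle 2-theta (degrees) from Bragg's law, lambda = 2 d sin(theta)."""
    sin_theta = wavelength / (2.0 * d_spacing)
    if sin_theta > 1.0:
        raise ValueError("d-spacing is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


# Resolution limits (in angstroms) quoted for the crystals in this study.
for d in (2.00, 1.80, 1.40, 1.15):
    print(f"d = {d:.2f} A -> 2-theta = {two_theta_deg(d):.1f} deg")
```

Higher-resolution data therefore require recording reflections at wider scattering angles.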

3. Results and discussion 

Symfoil-II was designed based on the crystal structure of Symfoil-4P to
have more perfect sequence repeats, as shown in Fig. 1. We first removed
the three NG (Asn-Gly) sequences by changing Asn58 and Asn100 to Ser or
Gln and by deleting the Gly141-Asn142-Gly143 sequence, giving Symfoil-SG
or Symfoil-QG, respectively, to protect the protein from deamidation
during the crystallization experiments. Then, we added a thrombin cleavage
site and a GQG sequence at the N-terminus of the first sequence repeat of
Symfoil-QG to make His-Symfoil-II. After removal of the N-terminal
histidine tag of His-Symfoil-II, Symfoil-II is expected to have three
complete sequence repeats in one protein, as shown in Fig. 1.
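The search for deamidation-prone Asn-Gly sites can be sketched in a few lines. The fragment below is an illustrative stretch spanning a repeat boundary, written in one-letter code for this example only; it is not the full Symfoil-4P sequence, and the helper names are hypothetical.

```python
import re


def find_ng_sites(sequence: str) -> list[int]:
    """1-based positions of the Asn residue in every Asn-Gly (NG) motif."""
    return [match.start() + 1 for match in re.finditer("NG", sequence)]


def replace_ng(sequence: str, substitute: str) -> str:
    """Replace the Asn of every NG motif with another residue, e.g. Q or S."""
    return sequence.replace("NG", substitute + "G")


# Illustrative fragment only (not the complete protein sequence).
fragment = "ISPEGNGEVLLKSTETG"

print(find_ng_sites(fragment))
print(replace_ng(fragment, "Q"))  # Symfoil-QG-style substitution
print(replace_ng(fragment, "S"))  # Symfoil-SG-style substitution
```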

Symfoil-SG, Symfoil-QG and His-Symfoil-II were prepared by expression in
the E. coli expression system. The purity of Symfoil-SG, Symfoil-QG and
His-Symfoil-II, which carry the N-terminal histidine tag, and of
Symfoil-II, which lacks it, was confirmed by SDS-PAGE (Fig. 2). SDS-PAGE
showed that the molecular size of His-Symfoil-II (lanes 5 and 10) was
slightly smaller than those of Symfoil-4P (lanes 2 and 7), Symfoil-QG
(lanes 3 and 8) and Symfoil-SG (lanes 4 and 9) because of the removal of
the N-terminal YKK sequence. After removal of the histidine tag,
Symfoil-II became smaller than the other Symfoils, as seen in lane 11
with heat treatment.




Figure 2

SDS-PAGE analysis of the purified Symfoil proteins. In lanes 2 to 6, the
samples were not boiled before loading. In lanes 7 to 11, the samples were
boiled before loading. Symfoil-4P: lanes 2 and 7; Symfoil-SG: lanes 3 and
8; Symfoil-QG: lanes 4 and 9; His-Symfoil-II: lanes 5 and 10; Symfoil-II:
lanes 6 and 11. Mark12 protein size markers (Life Technologies) are shown
in lanes 1 and 12.

Without heat treatment, Symfoil-II appeared extremely large (similar in
size to its dimer), suggesting that Symfoil-II might form a larger
complex. To identify the actual molecular size of Symfoil-II, it was
evaluated by gel filtration coupled to a multi-angle light-scattering
detector. The molecular weight of Symfoil-II was, however, estimated to be
14 × 10³, which is similar to the theoretical value (13932) for monomeric
Symfoil-II calculated from its primary structure. Although the mechanism
for the size shift seen in SDS-PAGE is still unclear, stabilization of
Symfoil-II against denaturation by SDS may be part of the reason why this
band shifts. Thermal melting experiments are under way.
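The assignment of the light-scattering result to a monomer is simple arithmetic, sketched below. The two masses are those quoted above; the helper name and the upper limit on the oligomer size are illustrative choices.

```python
THEORETICAL_MONOMER = 13932.0  # calculated from the primary structure (from the text)
MEASURED_MASS = 14.0e3         # SEC-MALLS estimate (from the text)


def nearest_oligomer(measured: float, monomer: float, max_subunits: int = 6) -> int:
    """Number of subunits whose total mass lies closest to the measured mass."""
    return min(range(1, max_subunits + 1),
               key=lambda n: abs(measured - n * monomer))


ratio = MEASURED_MASS / THEORETICAL_MONOMER
print(f"measured/theoretical = {ratio:.3f}")
print("closest state:", nearest_oligomer(MEASURED_MASS, THEORETICAL_MONOMER), "subunit(s)")
```

The ratio is close to 1, so the measured mass matches the monomer rather than the dimer suggested by the unboiled SDS-PAGE band.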

Having obtained Symfoil-II with three complete sequence repeats, we next
investigated the effect of removing the histidine tag and the Asn-Gly
sequences on X-ray diffraction using three independent crystal forms of
Symfoil-QG and two independent crystal forms of Symfoil-II. Symfoil-QG
crystals diffracted to 2.0, 2.0 and 1.8 Å resolution, respectively,
whereas Symfoil-II crystals diffracted to 1.4 and 1.15 Å resolution,
respectively, as summarized in Table 2. Symfoil-II crystallized in two
different space groups. The C2 space group was obtained only for
Symfoil-II, and this crystal diffracted to 1.15 Å resolution. The
diffraction limits and Wilson B-factors are shown in Table 2. They
indicate that Symfoil-II diffracted better than the other Symfoils, with
lower B-values. The improved diffraction and lower B-values of Symfoil-II
may be caused by structural stabilization. The close proximity of the
N- and C-termini in Symfoil-II may allow an ion pair to form, and this
electrostatic interaction may be part of the reason for its
stabilization. Evaluation of the stability of Symfoil-II with and without
the histidine tag is the next subject to be investigated.
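The data-collection statistics in Table 2 can be cross-checked, since the redundancy is the number of observed reflections divided by the number of unique reflections. The sketch below uses the overall values from Table 2 for the five data sets, in column order; the variable names are illustrative.

```python
# Overall values from Table 2, in column order
# (Symfoil-QG x 3, then Symfoil-II x 2).
observed = [48805, 60109, 42312, 145900, 235712]
unique = [7608, 11849, 8819, 22274, 55243]
reported_redundancy = [6.4, 5.1, 4.8, 6.6, 4.3]
wilson_b = [32.4, 20.4, 23.2, 18.2, 13.7]

computed = [round(n_obs / n_uni, 1) for n_obs, n_uni in zip(observed, unique)]

for calc, reported, b in zip(computed, reported_redundancy, wilson_b):
    status = "ok" if calc == reported else "mismatch"
    print(f"redundancy {calc:.1f} (reported {reported:.1f}, {status}), Wilson B {b:.1f}")
```

The same lists make the trend discussed above easy to see: the two Symfoil-II data sets have the lowest Wilson B-factors.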

Table 2

X-ray data collection and refinement statistics for Symfoil molecules.
Values in parentheses are for the highest-resolution shell.

                                  Symfoil-QG             Symfoil-QG             Symfoil-QG             Symfoil-II             Symfoil-II
Data collection
Beamline                          SPring-8 BL38B1        PF BL17A               SPring-8 BL38B1        PF BL5A                PF BL5A
Space group                       I222                   C222₁                  R3                     I222                   C2
Unit-cell a (Å)                   50.4                   58.3                   55.2†                  51.1                   81.4
Unit-cell b (Å)                   53.0                   66.5                   55.2†                  53.2                   47.9
Unit-cell c (Å)                   84.8                   66.6                   125.6†                 84.4                   57.2
Unit-cell β (°)                                                                                                               133
Resolution (outer shell) (Å)      27.7-2.00 (2.07-2.00)  26.7-1.80 (1.86-1.80)  26.2-2.00 (2.07-2.00)  25.6-1.40 (1.45-1.40)  41.8-1.15 (1.19-1.15)
No. of observed reflections       48805                  60109                  42312                  145900                 235712
No. of unique reflections         7608 (605)             11849 (1124)           8819 (913)             22274 (2133)           55243 (5300)
Redundancy                        6.4 (5.8)              5.1 (4.6)              4.8 (4.3)              6.6 (3.8)              4.3 (3.8)
Completeness (%)                  95.4 (78.4)            96.2 (92.1)            91.6 (95.8)            96.3 (93.7)            96.0 (93.2)
I/σ(I)                            25.0 (2.8)             24.9 (3.3)             14.5 (2.6)             42.5 (2.0)             43.1 (3.0)
Rmerge                            0.109 (0.503)          0.093 (0.415)          0.134 (0.433)          0.067 (0.601)          0.049 (0.649)
Wilson plot B-factor (Å²)         32.4                   20.4                   23.2                   18.2                   13.7
Refinement statistics
Resolution (Å)                    27.7-2.00              26.7-1.80              26.3-2.00              25.6-1.40              41.8-1.05
No. of water molecules            68                     82                     127                    113                    199
R factor/Rfree                    0.212/0.305            0.190/0.270            0.221/0.307            0.230/0.300            0.146/0.176
R.m.s.d. bonds (Å)                0.014                  0.018                  0.014                  0.021                  0.030
R.m.s.d. angles (°)               1.693                  2.025                  1.642                  2.424                  2.539
Program                           REFMAC                 REFMAC                 REFMAC                 REFMAC                 PHENIX/REFMAC

† Hexagonal obverse setting.

Figure 3

Structure of Symfoil-II in space group C2. (a) Overall structure of
Symfoil-II represented by a ribbon model. The first repeat (residues
12-53 in Fig. 1) is colored green, the second repeat (54-95) cyan and the
third repeat (96-137) orange. (b) Structure of the N- and C-terminal
residues in Symfoil-II. The 2Fo − Fc electron density map is contoured at
the 1.0σ level.

Crystal structures of Symfoil-QG and Symfoil-II were determined and the
refinement statistics are summarized in Table 2. The overall structure of
Symfoil-II is shown in Fig. 3(a). The RMS difference between the
structures of Symfoil-QG and Symfoil-II in the I222 crystal form is
0.39 Å, indicating that the structural difference caused by removal of
the N-terminal sequence is quite small. The location of the N-terminal
histidine tag of Symfoil-QG could not be determined in any of the
crystals obtained in this study. Electron densities for the loop regions
connecting the three repeats in Symfoil-QG and Symfoil-II were still
invisible, but became clearer in the structure of Symfoil-II determined
at 1.15 Å resolution (Fig. 3b). Assuming that Symfoil-II, with three
complete sequence repeats, has an ion pair between the N- and C-termini,
the structure has almost perfect threefold symmetry. RMS positional
differences after application of the rotation matrix calculated using the
structures of each repeat were less than 0.37 Å for both Symfoil-QG and
Symfoil-II, indicating that both Symfoils have threefold symmetry,
including the shape of the central cavity.
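The comparison of repeats by an RMS positional difference after rotation can be illustrated as follows. This is a simplified sketch, not the procedure used for the structures above: it assumes that the threefold axis is already aligned with the z axis through the origin, so the rotation is an exact 120° turn rather than a fitted matrix, and the coordinates are hypothetical.

```python
import math

Point = tuple[float, float, float]


def rotate_z(point: Point, angle_deg: float) -> Point:
    """Rotate a point about the z axis (assumed here to be the threefold axis)."""
    angle = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle),
            z)


def rmsd(coords_a: list[Point], coords_b: list[Point]) -> float:
    """Root-mean-square distance between equivalent atoms of two coordinate sets."""
    squared = [sum((a - b) ** 2 for a, b in zip(pa, pb))
               for pa, pb in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical coordinates (in angstroms) for equivalent atoms of one repeat.
repeat_1 = [(5.0, 0.0, 1.0), (6.5, 1.2, 2.0), (4.0, 2.5, 3.5)]

# A second repeat built as a 120-degree copy with a small uniform displacement,
# standing in for the deviations found in a real structure.
shift = (0.1, -0.1, 0.2)
repeat_2 = [tuple(c + s for c, s in zip(rotate_z(p, 120.0), shift)) for p in repeat_1]

rotated_1 = [rotate_z(p, 120.0) for p in repeat_1]
print(f"RMS difference after rotation: {rmsd(rotated_1, repeat_2):.3f} A")
```

In practice the rotation matrix would be obtained by least-squares superposition of equivalent atoms; a small RMS difference after applying it indicates that the repeats are related by near-exact threefold symmetry.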

In conclusion, we succeeded in preparing an artificial protein having
three complete sequence repeats. The resulting Symfoil-II showed improved
X-ray diffraction, and its structural details were determined. We are now
attempting to replace the ion pair between the N- and C-termini with an
amide bond to prepare a circular Symfoil, which may be useful as a
scaffold to capture molecules having C3 symmetry (Gibson & Castaldi,
2006) by virtue of specific interaction with the threefold axis of
symmetry present in Symfoil-II.

We thank the staff at SPring-8 and the Photon Factory. The synchrotron
radiation experiments were performed at the BL38B1 beamline at SPring-8
with the approval of the Japan Synchrotron Radiation Research Institute
(proposal No. 2010A1921) and at the BL5A and BL1A beamlines at the
Photon Factory (proposal Nos. 11G088 and 13G122).